add setting to define filename pattern for part exports #1490
arthurpassos wants to merge 4 commits into antalya-26.1
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d69971b4f2
manifest.parquet_parallel_encoding = json->getValue<bool>("parquet_parallel_encoding");
manifest.max_bytes_per_file = json->getValue<size_t>("max_bytes_per_file");
manifest.max_rows_per_file = json->getValue<size_t>("max_rows_per_file");
manifest.filename_pattern = json->getValue<String>("filename_pattern");
Preserve manifest backward compatibility for filename_pattern
Deserialization now requires filename_pattern unconditionally, but metadata written by earlier versions does not include this key. Any node that reads an older exports/.../metadata.json (for example while checking existing exports or canceling an export in StorageReplicatedMergeTree) will throw during fromJsonString, breaking in-flight export management after upgrade. Make this field optional and fall back to the default pattern when absent.
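The fallback suggested above can be sketched as follows. This is a minimal illustration, not the PR's code: `FakeJson`, `ManifestSketch`, `getRequired`, `getOr`, and `fromFakeJson` are hypothetical names, and a plain `std::map` stands in for the parsed `metadata.json` object so the example stays self-contained. Only the default value `{part_name}_{checksum}` is taken from the documentation hunk in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parsed metadata.json object: key -> raw string value.
using FakeJson = std::map<std::string, std::string>;

// Default pattern as stated in the documentation hunk of this PR.
const std::string DEFAULT_FILENAME_PATTERN = "{part_name}_{checksum}";

// Hypothetical, trimmed-down manifest with one required and one optional field.
struct ManifestSketch
{
    std::size_t max_rows_per_file = 0;
    std::string filename_pattern;
};

// Required key: throws if the key is absent, mirroring an unconditional read.
const std::string & getRequired(const FakeJson & json, const std::string & key)
{
    assert(!key.empty());
    auto it = json.find(key);
    if (it == json.end())
        throw std::runtime_error("missing key: " + key);
    return it->second;
}

// Optional key: returns the fallback when the key is absent.
std::string getOr(const FakeJson & json, const std::string & key, const std::string & fallback)
{
    auto it = json.find(key);
    return it == json.end() ? fallback : it->second;
}

ManifestSketch fromFakeJson(const FakeJson & json)
{
    ManifestSketch manifest;
    manifest.max_rows_per_file = std::stoull(getRequired(json, "max_rows_per_file"));
    // Metadata written before this change has no filename_pattern key,
    // so fall back to the default instead of throwing.
    manifest.filename_pattern = getOr(json, "filename_pattern", DEFAULT_FILENAME_PATTERN);
    return manifest;
}
```

With this shape, metadata that lacks `filename_pattern` still deserializes and gets the default pattern, while a missing required key such as `max_rows_per_file` still fails loudly.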
Nobody is using this feature yet, so it is OK to introduce backward-incompatible changes like this. We literally have 0 users so far.
- **Type**: `String`
- **Default**: `{part_name}_{checksum}`
- **Description**: Pattern for the filename of the exported merge tree part. The `part_name` and `checksum` are calculated and replaced on the fly. Additional macros are supported.
Why do we duplicate part_export.md content here?
Well, this is export partition (slightly different feature), and at some point there might be settings that are not supported by export partition and only by export part.
I don't have a good answer tbh.
Macros::MacroExpansionInfo macro_info;
macro_info.table_id = storage_id;
filename = local_context->getMacros()->expand(filename, macro_info);
Why do we need special logic for {part_name} and {checksum}?
In other words, why don't we put it inside expand()?
Because part_name and checksum are calculated on the fly based on the data part being exported. They are not meant to be extracted from macros; it would not even work tbh.
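To illustrate the distinction, here is a rough sketch of a two-step expansion: per-part placeholders are filled from the data part being exported, and everything else is resolved from configured macros. All names (`replaceAll`, `PartInfoSketch`, `buildFilename`) are hypothetical, a `std::map` stands in for the server's macro set, and the order of the two steps is an assumption rather than something the diff confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string & from, const std::string & to)
{
    assert(!from.empty());
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos)
    {
        text.replace(pos, from.size(), to);
        pos += to.size(); // skip past the inserted text to avoid re-matching it
    }
    return text;
}

// Hypothetical per-part values, only known once a concrete data part is exported.
struct PartInfoSketch
{
    std::string name;
    std::string checksum;
};

std::string buildFilename(
    std::string pattern,
    const PartInfoSketch & part,
    const std::map<std::string, std::string> & server_macros)
{
    // Step 1: values computed from the data part itself (not available as macros).
    pattern = replaceAll(pattern, "{part_name}", part.name);
    pattern = replaceAll(pattern, "{checksum}", part.checksum);

    // Step 2: static, configuration-level macros such as a shard or replica name.
    for (const auto & [name, value] : server_macros)
        pattern = replaceAll(pattern, "{" + name + "}", value);

    return pattern;
}
```

For example, with hypothetical macros `shard = 01` and a part named `all_1_1_0` with checksum `abc123`, the pattern `{shard}_{part_name}_{checksum}` would yield a filename prefixed by the shard value, which is how a custom pattern can keep exports from different shards apart.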
test_export_replicated_mt_partition_to_object_storage/test.py::test_export_partition_from_replicated_database_uses_db_shard_replica_macros test failure could be related to this PR.
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add a setting to define the filename pattern for part exports, which helps with sharding. This is a port of the unmerged and unreviewed PR #1383.